<html>
<head>
  <!--#include virtual="header.html" -->
  <title>Problem Based Benchmark Suite : Suffix Arrays</title>
</head>

<body>
<!--#include virtual="navbar.html" -->
<div class=center>

<h2>Suffix Arrays (SA):</h2> 

<p> Given a string, generate its <a
href="http://en.wikipedia.org/wiki/Suffix_array">suffix array</a> (the
starting positions of all suffixes of the input, listed in the sorted
order of those suffixes).
</p>
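<p>
As a rough illustration of the definition, the sketch below builds a
suffix array by directly sorting the suffix start positions.  It is not
one of the benchmark implementations; the function name
<tt>suffix_array</tt> is hypothetical, and the approach is far too slow
for inputs of the sizes listed on this page.
</p>

```python
def suffix_array(s):
    """Return the 0-based start positions of all suffixes of s,
    ordered by the lexicographic order of the suffixes themselves."""
    # Naive sketch (an assumption for illustration, not the benchmark code):
    # each comparison may scan a whole suffix, so the worst case is roughly
    # O(n^2 log n) time, and the slices use O(n^2) memory.
    return sorted(range(len(s)), key=lambda i: s[i:])


if __name__ == "__main__":
    # Suffixes of "banana": a(5), ana(3), anana(1), banana(0), na(4), nana(2)
    print(suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```

<p>
For ASCII input, Python's default string comparison matches byte-wise
lexicographic order, which is the ordering assumed in this sketch.
</p>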

<h3>Input and Output File Formats</h3>

<p>
The input is an ASCII string and the output is an integer sequence
in the <a
href="benchmarks/sequenceIO.html">sequence</a> format.  The integers
in the output are 0-based positions in the input and must be sorted
by the lexicographic order of the suffixes that start at those
positions.
</p>
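<p>
The output requirement can be checked with a small sketch along the
following lines.  The function name <tt>is_valid_suffix_array</tt> is
hypothetical, the input string and output integers are assumed to be
already loaded into memory (parsing the sequence format is not shown),
and the direct suffix comparisons make it practical only for short
strings.
</p>

```python
def is_valid_suffix_array(s, sa):
    """Check that sa lists every 0-based position of s exactly once and
    that the suffixes starting at those positions are in sorted order."""
    n = len(s)
    # Every suffix must be represented, so sa must be a permutation of 0..n-1.
    if sorted(sa) != list(range(n)):
        return False
    # Distinct suffixes have distinct lengths, so they are never equal and
    # adjacent entries must be strictly increasing.
    return all(s[sa[k]:] < s[sa[k + 1]:] for k in range(n - 1))


if __name__ == "__main__":
    print(is_valid_suffix_array("banana", [5, 3, 1, 0, 4, 2]))  # True
    print(is_valid_suffix_array("banana", [0, 1, 2, 3, 4, 5]))  # False
```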

<h3>Default Input Distributions</h3>

<p>
One of the inputs is synthetic and the other three are taken from real
sources.  The number in parentheses before each input is its weight;
the weights differ because the inputs differ in length.
</p>
<ul>
<li>
(20) A trigram string of length n=10,000,000.
<blockquote>
<tt>trigramString &lt;n&gt; &lt;filename&gt;</tt>
</blockquote>
</li>

<li>
(6) <tt>chr22.dna</tt> is a DNA sequence.  It consists only of the
characters A, C, G, T, and N and has about 34 million characters.
</li>

<li>
(1) <tt>etext99</tt> is text from Project Gutenberg.  It has
about 105 million characters.
</li>

<li>
(1) <tt>wikisamp.xml</tt> is a sample from Wikipedia's XML source files.  It has
exactly 100 million characters.
</li>

</ul>

</div>
<!--#include virtual="footer.html" -->
</body>
</html>
